Masked decomposition of polynomials for lattice-based cryptography

ABSTRACT

Various implementations relate to a data processing system comprising instructions embodied in a non-transitory computer readable medium, the instructions for a cryptographic operation including a masked decomposition of a polynomial a having n_(s) arithmetic shares into a high part a₁ and a low part a₀ for lattice-based cryptography in a processor, the instructions including: performing a rounded Euclidean division of the polynomial a by a base α to compute t^((⋅)A); extracting Boolean shares a₁^((⋅)B) from n low bits of t by performing an arithmetic share to Boolean share (A2B) conversion on t^((⋅)A) and performing an AND with ζ−1, where ζ=−α⁻¹ is a power of 2; unmasking a₁ by combining Boolean shares of a₁^((⋅)B); calculating arithmetic shares a₀^((⋅)A) of the low part a₀; and performing a cryptographic function using a₁ and a₀^((⋅)A).

TECHNICAL FIELD

Various exemplary embodiments disclosed herein relate generally to efficient and masked decomposition of polynomials for lattice-based cryptography.

BACKGROUND

Recent significant advances in quantum computing have accelerated the research into post-quantum cryptography schemes: cryptographic algorithms which run on classical computers but are believed to be still secure even when faced with an adversary with access to a quantum computer. This demand is driven by interest from standardization bodies, such as the call for proposals for new public-key cryptography standards by the National Institute of Standards and Technology (NIST). The selection procedure for this new cryptographic standard has started and has further accelerated the research of post-quantum cryptography schemes.

There are various families of problems to instantiate these post-quantum cryptographic approaches. Constructions based on the hardness of lattice problems are considered to be promising candidates to become the next standard. A subset of approaches considered within this family are instantiations of the Learning With Errors (LWE) framework: the Ring-Learning With Errors problem. One of the leading lattice-based signature schemes is Dilithium, which requires operations involving arithmetic with polynomials with integer coefficients. When implemented, the main computationally expensive operations are the arithmetic with polynomials. More precisely, computations are done in a ring $R_q = (\mathbb{Z}/q\mathbb{Z})[X]/(F)$: the ring where polynomial coefficients are in $\mathbb{Z}/q\mathbb{Z}$ and the polynomial arithmetic is performed modulo a polynomial F.
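As a minimal illustration of arithmetic in such a ring, the following sketch multiplies two polynomials with coefficients modulo q and reduces the product modulo F. It assumes F = Xⁿ + 1 (the form used by Dilithium) and uses a plain schoolbook product; the function name `ring_mul` is hypothetical and no claim is made that a real implementation works this way.

```python
Q = 2**23 - 2**13 + 1  # Dilithium modulus, as given later in this description


def ring_mul(f, g, q=Q):
    """Multiply f and g in (Z/qZ)[X]/(X^n + 1).

    Polynomials are coefficient lists of equal length n, lowest degree first.
    The choice F = X^n + 1 is an assumption made for this sketch.
    """
    n = len(f)
    if len(g) != n:
        raise ValueError("both polynomials must have n coefficients")
    out = [0] * n
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            k = i + j
            if k < n:
                out[k] = (out[k] + fi * gj) % q
            else:
                # X^n = -1 modulo F, so the term wraps around with a sign flip.
                out[k - n] = (out[k - n] - fi * gj) % q
    return out
```

For example, with n = 4 the product X³ · X wraps around to −1, which is represented as q − 1 in the constant coefficient.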

SUMMARY

A summary of various exemplary embodiments is presented below. Some simplifications and omissions may be made in the following summary, which is intended to highlight and introduce some aspects of the various exemplary embodiments, but not to limit the scope of the invention. Detailed descriptions of an exemplary embodiment adequate to allow those of ordinary skill in the art to make and use the inventive concepts will follow in later sections.

Various embodiments relate to a data processing system comprising instructions embodied in a non-transitory computer readable medium, the instructions for a cryptographic operation including a masked decomposition of a polynomial a having n_(s) arithmetic shares into a high part a₁ and a low part a₀ for lattice-based cryptography in a processor, the instructions including: performing a rounded Euclidean division of the polynomial a by a base α to compute t^((⋅)A); extracting Boolean shares a₁^((⋅)B) from n low bits of t by performing an arithmetic share to Boolean share (A2B) conversion on t^((⋅)A) and performing an AND with ζ−1, where ζ=−α⁻¹ is a power of 2; unmasking a₁ by combining Boolean shares of a₁^((⋅)B); calculating arithmetic shares a₀^((⋅)A) of the low part a₀; and performing a cryptographic function using a₁ and a₀^((⋅)A).

Various embodiments are described, wherein performing a rounded Euclidean division of the polynomial a by a base α to compute t^((⋅)A) includes adding α/2 to a^((⋅)A) and dividing by α.

Various embodiments are described, wherein performing a rounded Euclidean division of the polynomial a by a base α to compute t^((⋅)A) includes calculating: t^((⋅)A)=a^((⋅)A)+γ; and t^((⋅)A)=α⁻¹×t^((⋅)A)−(q mod ζ), where γ=α/2 and q is a prime modulus.

Various embodiments are described, wherein calculating arithmetic shares a₀^((⋅)A) of the low part a₀ includes: calculating u^((⋅)A) by subtracting a₁ from t^((⋅)A) and adding q mod ζ, where q is a prime modulus; and multiplying u^((⋅)A) by α and then subtracting α/2.

Further various embodiments relate to a data processing system comprising instructions embodied in a non-transitory computer readable medium, the instructions for a cryptographic operation including a masked decomposition of a polynomial a having n_(s) arithmetic shares into a high part a₁ and a low part a₀ for lattice-based cryptography in a processor, the instructions including: performing a rounded Euclidean division of the polynomial a by a base α to compute t^((⋅)A); extracting Boolean shares a₁^((⋅)B) from n low bits of t by performing an arithmetic share to Boolean share (A2B) conversion on t^((⋅)A) and performing an AND with ζ−1, where ζ=−α⁻¹ is a power of 2; unmasking a₁ by combining Boolean shares of a₁^((⋅)B); calculating the Boolean shares a₀^((⋅)B) of the low part a₀; and performing a cryptographic function using a₁ and a₀^((⋅)B).

Various embodiments are described, wherein performing a rounded Euclidean division of the polynomial a by a base α to compute t^((⋅)A) includes adding α/2 to a^((⋅)A) and dividing by α.

Various embodiments are described, wherein performing a rounded Euclidean division of the polynomial a by a base α to compute t^((⋅)A) includes calculating: t^((⋅)A)=a^((⋅)A)+γ; and t^((⋅)A)=α⁻¹×t^((⋅)A)−(q mod ζ), where γ=α/2 and q is a prime modulus.

Various embodiments are described, wherein calculating the Boolean shares a₀^((⋅)B) of the low part a₀ includes: shifting t^((⋅)B) n bits to the right, where n is a number of bits in ζ; and calculating a₀^((⋅)B)=SecAdd((γ+(q mod ζ))^((⋅)B), ¬t^((⋅)B)), where γ=α/2 and q is a prime modulus.

Further various embodiments relate to a data processing system comprising instructions embodied in a non-transitory computer readable medium, the instructions for a cryptographic operation including a masked decomposition of a polynomial a having n_(s) arithmetic shares into a high part a₁ and a low part a₀ for lattice-based cryptography in a processor, the instructions including: performing a rounded Euclidean division of the polynomial a by a base α to compute t^((⋅)A); extracting Boolean shares a₁^((⋅)B) from n low bits of t by performing an arithmetic share to Boolean share (A2B) conversion on t^((⋅)A) to produce t^((⋅)B) and performing a Boolean share to arithmetic share (B2A) conversion on t^((⋅)B), where ζ=−α⁻¹; unmasking a₁ by combining arithmetic shares of a₁^((⋅)A); calculating the arithmetic shares a₀^((⋅)A) of the low part a₀; and performing a cryptographic function using a₁ and a₀^((⋅)A).

Various embodiments are described, wherein performing a rounded Euclidean division of the polynomial a by a base α to compute t^((⋅)A) includes adding α/2 to a^((⋅)A) and dividing by α.

Various embodiments are described, wherein performing a rounded Euclidean division of the polynomial a by a base α to compute t^((⋅)A) includes calculating: t^((⋅)A)=a^((⋅)A)+γ; and t^((⋅)A)=α⁻¹×t^((⋅)A)−(q mod ζ), where γ=α/2 and q is a prime modulus.

Various embodiments are described, wherein calculating the arithmetic shares a₀^((⋅)A) of the low part a₀ includes: calculating u^((⋅)A) by subtracting a₁ from t^((⋅)A) and adding q mod ζ, where q is a prime modulus; and multiplying u^((⋅)A) by α and then subtracting α/2.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to better understand various exemplary embodiments, reference is made to the accompanying drawings, wherein:

FIG. 1 illustrates an exemplary hardware diagram for implementing masked polynomial decomposition.

To facilitate understanding, identical reference numerals have been used to designate elements having substantially the same or similar structure and/or substantially the same or similar function.

DETAILED DESCRIPTION

The description and drawings illustrate the principles of the invention. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the invention and are included within its scope. Furthermore, all examples recited herein are principally intended expressly to be for pedagogical purposes to aid the reader in understanding the principles of the invention and the concepts contributed by the inventor(s) to furthering the art and are to be construed as being without limitation to such specifically recited examples and conditions. Additionally, the term, “or,” as used herein, refers to a non-exclusive or (i.e., and/or), unless otherwise indicated (e.g., “or else” or “or in the alternative”). Also, the various embodiments described herein are not necessarily mutually exclusive, as some embodiments can be combined with one or more other embodiments to form new embodiments.

Some post-quantum cryptography schemes require a decomposition or a truncation-like operation of polynomial coefficients. For unprotected implementations, this can be straightforwardly realized via different algorithms, mainly involving a Euclidean division or a simple truncation for power of two base and modulus. One family of attacks, so-called side-channel analysis, exploits data dependencies in physical measurements of the target device (e.g., power consumption) to recover secret keys and can be thwarted with the help of masking the processed data. The decomposition operation requires protection since its input and outputs depend on the secret key. However, previous or straightforward techniques for masked decomposition introduce a significant performance overhead for non power of two base and modulus. In this disclosure, a new approach is presented to perform masked decomposition securely and efficiently for non power of 2 moduli.

The signing operation of a digital signature scheme generates a signature for a given message using a secret key. If this secret key were to be leaked, it would invalidate the security properties provided by the scheme. It has been shown that unprotected implementations of post-quantum signature schemes are vulnerable to implementation attacks, e.g., side-channel analysis. In particular, it was demonstrated that the secret key may be extracted from physical measurements of key-dependent parts in the signing operation.

For Dilithium, the key-dependent operations include the decomposition of polynomials in a base α. Concretely, for a coefficient $a \in \mathbb{Z}/q\mathbb{Z}$, the decompose operation computes the high part a₁ and the low part a₀ such that

$a \bmod q = a_{1} \times \alpha + a_{0} \quad \text{with} \quad -\frac{\alpha}{2} < a_{0} \leq \frac{\alpha}{2},$

except if $a_{1} = \frac{q - 1}{\alpha}$, where a₁ is set to 0 and a₀=(a mod q)−q, with $-\frac{\alpha}{2} \leq a_{0} < 0$. The possible values for the decomposition base α are $\frac{q - 1}{16}$ and $\frac{q - 1}{44}$. Additionally, the parameter γ is defined such that α=2γ. While the decomposition operation is trivial in the unmasked case, a secure implementation of this digital signature scheme requires the integration of dedicated countermeasures for this step.
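The unmasked decomposition described above can be sketched as follows. This is only an illustrative reference for the definition (centered remainder plus the corner case), not a protected implementation; the function name `decompose` is hypothetical.

```python
Q = 2**23 - 2**13 + 1  # Dilithium modulus


def decompose(a, alpha, q=Q):
    """Unmasked reference: return (a1, a0) with a mod q = a1*alpha + a0."""
    a %= q
    # Centered remainder: a0 lies in (-alpha/2, alpha/2].
    a0 = a % alpha
    if a0 > alpha // 2:
        a0 -= alpha
    # Corner case: a1 would equal (q - 1) / alpha, so a1 wraps to 0
    # and a0 becomes (a mod q) - q.
    if a - a0 == q - 1:
        return 0, a0 - 1
    return (a - a0) // alpha, a0
```

With α = (q−1)/16, the call `decompose(Q - 1, (Q - 1) // 16)` hits the corner case and returns `(0, -1)`.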

Masking is a common countermeasure to thwart side-channel analysis and has been utilized for various applications. Besides security, efficiency is also an important aspect when designing a masked algorithm. Important metrics for software implementations of masking are the number of operations and the number of fresh random elements required for the masking scheme.

The first masking approach for Dilithium was proposed in Vincent Migliore, Benoît Gérard, Mehdi Tibouchi, and Pierre-Alain Fouque, Masking Dilithium: efficient implementation and side-channel evaluation, Applied Cryptography and Network Security - 17th International Conference, ACNS 2019, Bogota, Colombia, Jun. 5-7, 2019, Proceedings (Robert H. Deng, Valérie Gauthier-Umaña, Martín Ochoa, and Moti Yung, eds.), Lecture Notes in Computer Science, vol. 11464, Springer, 2019, pp. 344-362 (Migliore). In Migliore, the decomposition operation for a prime modulus is performed using multiple arithmetic additions modulo q over Boolean shares. It takes as input an arithmetic sharing of coefficients and produces Boolean-shared decompositions.

Lattice-based signature schemes similar to Dilithium include GLP and qTESLA. The first dedicated masking of GLP was presented in Gilles Barthe, Sonia Belaïd, Thomas Espitau, Pierre-Alain Fouque, Benjamin Grégoire, Mélissa Rossi, and Mehdi Tibouchi, Masking the GLP lattice-based signature scheme at any order, Advances in Cryptology - EUROCRYPT 2018 - 37th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Tel Aviv, Israel, Apr. 29-May 3, 2018 Proceedings, Part II (Jesper Buus Nielsen and Vincent Rijmen, eds.), Lecture Notes in Computer Science, vol. 10821, Springer, 2018, pp. 354-384 (Barthe). GLP does not require a decomposition operation similar to the one needed for Dilithium. qTESLA only requires a rounding operation by a power of 2. In François Gérard and Mélissa Rossi, An efficient and provable masked implementation of qTESLA, Smart Card Research and Advanced Applications - 18th International Conference, CARDIS 2019, Prague, Czech Republic, Nov. 11-13, 2019, Revised (Sonia Belaïd and Tim Güneysu, eds.), Lecture Notes in Computer Science, vol. 11833, Springer, 2019, pp. 74-91 (Gérard), the authors extend Barthe to mask the signature scheme qTESLA, but modify the original parameters of the scheme by changing the prime modulus to a power of two for simpler masking of the rounding operation.

Decomposition methods are disclosed herein that improve on the state of the art, enabling significantly more efficient implementations of post-quantum cryptography (PQC) schemes requiring a decomposition operation with non power of 2 moduli. An example of such a PQC scheme is Dilithium, which includes the decomposition of secret polynomial coefficients by the base α. The decomposition methods disclosed herein reduce both the number of operations and the number of random elements required.

SecDecomposeOriginal (or Algorithm 12 Decompose in Migliore, where the name is adapted for better readability) computes a₀ first as the division remainder of a by α and then ensures that a₀ is in the range required by the Decompose function specifications. The next step of SecDecomposeOriginal is to compute a₁ as $\frac{a - a_{0}}{\alpha}$. The last step of the algorithm is to evaluate the specific case where a−a₀=q−1. SecDecomposeOriginal begins by converting the input from arithmetic to Boolean shares and then exclusively operates on the Boolean shares.

SecDecomposeOriginal has the following drawbacks. First, because the output a₀ of the decomposition function is the input to an addition, an implementation of Dilithium using SecDecomposeOriginal requires an additional B2A conversion in order to perform the addition efficiently or alternatively performs this addition on Boolean shares. Neither option is efficient because both a B2A conversion and an addition on Boolean shares are expensive operations. Second, in SecDecomposeOriginal the computation of a₀ and a₁ requires many calls to SecAdd (i.e., arithmetic addition over Boolean shares), which is a particularly expensive operation on Boolean shares. Third, in SecDecomposeOriginal almost half of the operations are used only to ensure the range on a₀ and to cater to the specific case where a−a₀=q−1 (in particular, this part of the algorithm includes one call to SecAnd and two calls to SecAdd).

As a result, for NIST level 5 for one Dilithium signing iteration and d=5, SecDecomposeOriginal takes about 24.2 million operations and 100.3 million random bits. The proposed SecDecomposeA2Apow2 takes 8 million operations and 38 million random bits for the same task. For NIST level 3, taking into account the average number of signing iterations (=5.1) and for d=5, SecDecomposeOriginal takes 92.6 million operations and 383.9 million random bits. The proposed SecDecomposeA2Apow2 requires only 30.8 million operations and 145.3 million random bits in comparison.

The functions SecDecomposeA2Apow2 and SecDecomposeA2Bpow2 provide efficient decomposition when the additive inverse of the base's multiplicative inverse (i.e., ζ=−α⁻¹) is a power of two. The first function SecDecomposeA2Apow2 circumvents the first drawback of SecDecomposeOriginal by providing a₀ in arithmetic shares. The second function SecDecomposeA2Bpow2 provides a more efficient alternative to SecDecomposeOriginal while still outputting a Boolean-shared a₀, when this is needed for certain implementations. The function SecDecomposeA2Anotpow2 provides an efficient alternative for arbitrary decomposition bases (when ζ=−α⁻¹ is not a power of 2). All the functions make use of the observation that once a₁ is computed, a₁ may be unmasked because its value may be easily recovered from a public signature and the public key.

The improvements of the decomposition functions disclosed herein are based on computing a₁ first using a rounded division by α. As opposed to SecDecomposeOriginal, which operates only on Boolean shares, this is efficiently performed on the input arithmetic shares of a by adding $\gamma = \frac{\alpha}{2}$ and multiplying by the inverse of α modulo q. This allows for a conversion to Boolean shares much later and only when necessary.
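To make the role of the inverse concrete, the sketch below computes α⁻¹ modulo q and ζ=−α⁻¹ for the two Dilithium bases mentioned in this description, and checks whether ζ is a power of two. The helper names are hypothetical; `pow(alpha, -1, q)` is the built-in modular inverse available in Python 3.8 and later.

```python
Q = 2**23 - 2**13 + 1  # Dilithium modulus


def zeta_for_base(alpha, q=Q):
    """Return zeta = -alpha^{-1} mod q for a decomposition base alpha."""
    alpha_inv = pow(alpha, -1, q)  # multiplicative inverse modulo q
    return (-alpha_inv) % q


def is_power_of_two(x):
    """True when x is a positive power of two (including 1)."""
    return x > 0 and x & (x - 1) == 0


for divisor in (16, 44):
    alpha = (Q - 1) // divisor
    zeta = zeta_for_base(alpha)
    print(f"alpha=(q-1)/{divisor}: zeta={zeta}, power of two: {is_power_of_two(zeta)}")
```

Because α×16 = q−1 ≡ −1 (mod q) for the first base, its inverse is −16 and ζ=16; the same argument gives ζ=44 for the second base, which is why the two cases need different functions.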

The disclosed decomposition functions with rounded division also allow for the computation of the value of a₁ without any corner case evaluation or correction, and implicitly result in the correct range for a₀ when it is computed from a and the computed a₁. Functions SecDecomposeA2Apow2 and SecDecomposeA2Bpow2 require one A2B conversion to perform the following ∧ operation efficiently. SecDecomposeA2Apow2 does not require any other expensive operation on arithmetic or Boolean shares. SecDecomposeA2Bpow2 requires a single addition on Boolean shares. SecDecomposeA2Anotpow2 performs an efficient decomposition for a non power of 2 base and uses one A2B conversion and one B2A conversion.

The disclosed decomposition functions may be applied for arbitrary non power of 2 moduli q and arbitrary decomposition base α. Multiple decomposition function versions are provided, namely for when −α⁻¹ is a power of two and for when it is not. All masked operations on arithmetic shares are naturally performed modulo q. The annotation mod q shows where a modular reduction has to be performed for the unmasked versions of the algorithms. If an operation is not annotated, the reduction is either explicit or not required.

An application of the disclosed decomposition functions may include the signing process of Dilithium where the coefficients of a secret vector w are decomposed. Concretely, each coefficient $a \in \mathbb{Z}/q\mathbb{Z}$ (with q=2²³−2¹³+1) of w is decomposed into its high and low parts a₁ and a₀ such that a=a₁×α+a₀. The values of a₁ and a₀ are such that $-\frac{\alpha}{2} < a_{0} \leq \frac{\alpha}{2}$, except if $a_{1} = \frac{q - 1}{\alpha}$, where a₁ is set to 0 and a₀=(a mod q)−q, with $-\frac{\alpha}{2} \leq a_{0} < 0$. The possible values for the decomposition base α are $\frac{q - 1}{16}$ and $\frac{q - 1}{44}$. Additionally, the parameter γ is defined such that α=2γ.

The coefficients of w must remain secret to ensure the security of the signature scheme. A common approach is to split up the sensitive values into Boolean or arithmetic shares. A Boolean or arithmetically masked variable x is denoted x^((⋅)B) or x^((⋅)A), respectively, with $\bigoplus_{i=0}^{n_s - 1} x^{(i)_B} = x$ or $\sum_{i=0}^{n_s - 1} x^{(i)_A} = x \bmod q$, respectively (n_(s) being the number of shares). Also note that for protected implementations, the input to the decomposition function is arithmetically shared (because it is the output of a multiplication).
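The two sharing relations can be illustrated with a short sketch. The helper names and the default 32-bit share width are assumptions made for the example, and Python code of this kind offers no side-channel protection by itself; it only shows how the shares recombine.

```python
import secrets
from functools import reduce
from operator import xor


def arithmetic_share(x, n_s, q):
    """Split x into n_s shares whose sum is x modulo q."""
    shares = [secrets.randbelow(q) for _ in range(n_s - 1)]
    shares.append((x - sum(shares)) % q)
    return shares


def boolean_share(x, n_s, width=32):
    """Split x into n_s shares whose XOR is x (width is an assumed word size)."""
    shares = [secrets.randbits(width) for _ in range(n_s - 1)]
    shares.append(reduce(xor, shares, x))
    return shares


def arithmetic_unshare(shares, q):
    """Recombine arithmetic shares: sum modulo q."""
    return sum(shares) % q


def boolean_unshare(shares):
    """Recombine Boolean shares: XOR of all shares."""
    return reduce(xor, shares, 0)
```

Any n_(s)−1 of the shares are uniformly random, so the secret is only revealed when all shares are combined.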

The new disclosed decomposition functions SecDecomposeA2Apow2, SecDecomposeA2Bpow2, and SecDecomposeA2Anotpow2 will now be described.

The function SecDecomposeA2Apow2 is dedicated to the case where ζ=−α⁻¹ is a power of 2. Let n be the number of bits in ζ. Pseudocode is used below to describe SecDecomposeA2Apow2. SecDecomposeA2Apow2 outputs a₀ in arithmetic shares, and a₁ is unshared. The first step of SecDecomposeA2Apow2 is to compute a₁ as the rounded Euclidean division of a by α as illustrated in Lines 1-2. The rounding is done to the nearest integer. This is achieved by first adding $\frac{\alpha}{2} = \gamma$ in Line 1 and then dividing by α in Line 2. Because the operations are performed on arithmetic shares modulo q, the division by α may be performed very efficiently by multiplying with its multiplicative inverse α⁻¹ modulo q. This multiplication gives α⁻¹×t=α⁻¹×(α×a₁+a₀+γ)=a₁+ζ(−a₀−γ). Before a₁ can be extracted from α⁻¹×t as the n low-order bits, the low-order bits of q need to be accounted for because the positive representation of a₁+ζ(−a₀−γ) is q+a₁+ζ(−a₀−γ). The influence of the n low-order bits of q is removed by subtracting q mod ζ at Line 2. Then, a₁ is extracted as the n low bits of t by performing an A2B conversion at Line 3, an AND with ζ−1 at Line 4, and unmasking a₁ at Line 5 by recombining the Boolean shares. Next, at Line 6, u is computed by subtracting a₁ and undoing the subtraction of q mod ζ, i.e., for $\alpha = \frac{q - 1}{16}$ (where α⁻¹=−16 and q mod ζ=1):

$u = t - a_{1} + (q \bmod \zeta) = a_{1} + \alpha^{-1} \times (a_{0} + \gamma) - 1 - a_{1} + 1 = -16 \times (a_{0} + \gamma). \quad (1)$

Lines 7 and 8 involve dividing u by α⁻¹, which is equivalent to multiplying by α, and then subtracting γ to recover a₀.

Function SecDecomposeA2Apow2(a^((⋅)A))
Input: An arithmetic sharing a^((⋅)A) of a coefficient.
Output: A decomposition a₁, a₀^((⋅)A) of a^((⋅)A).
1: t^((⋅)A) = a^((⋅)A) + γ
2: t^((⋅)A) = α⁻¹ × t^((⋅)A) − (q mod ζ)    ▷ mod q
3: t^((⋅)B) = A2B(t^((⋅)A))
4: a₁^((⋅)B) = t^((⋅)B) ∧ (ζ − 1)
5: a₁ = ⊕_(i=0)^(n_(s)−1) a₁^((i)B)    ▷ unmasking of a₁
6: u^((⋅)A) = t^((⋅)A) − a₁ + (q mod ζ)
7: u^((⋅)A) = α × u^((⋅)A)    ▷ mod q
8: a₀^((⋅)A) = u^((⋅)A) − γ
9: return a₁, a₀^((⋅)A)
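The arithmetic behind SecDecomposeA2Apow2 can be checked on unmasked values. The following sketch follows the pseudocode line by line but operates on a plain integer instead of shares, so the A2B conversion and the unmasking collapse into a single bit mask. It is not a masked implementation; the function name and the final mapping of a₀ to a signed representative are assumptions added for the example.

```python
Q = 2**23 - 2**13 + 1  # Dilithium modulus


def decompose_a2a_pow2_unmasked(a, alpha, q=Q):
    """Value-level trace of SecDecomposeA2Apow2 (no shares, no protection)."""
    gamma = alpha // 2
    alpha_inv = pow(alpha, -1, q)            # alpha^{-1} mod q (Python 3.8+)
    zeta = (-alpha_inv) % q
    if zeta & (zeta - 1) != 0:
        raise ValueError("zeta must be a power of two for this variant")

    t = (a + gamma) % q                      # Line 1
    t = (alpha_inv * t - (q % zeta)) % q     # Line 2
    a1 = t & (zeta - 1)                      # Lines 3-5: n low bits of t
    u = (t - a1 + (q % zeta)) % q            # Line 6
    u = (alpha * u) % q                      # Line 7
    a0 = (u - gamma) % q                     # Line 8
    if a0 > q // 2:                          # signed representative of a0 mod q
        a0 -= q
    return a1, a0
```

For α=(q−1)/16 the input a=q−1 yields `(0, -1)` without any explicit corner-case branch, which matches the special case in the decomposition definition.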

A function SecDecomposeA2Bpow2, illustrated in pseudocode below, presents an alternative to the SecDecomposeA2Apow2 function when Boolean shares are required for the masking of a₀. As in the SecDecomposeA2Apow2 function, a₁ is unshared. First, the same rounded-to-nearest-integer Euclidean division as in the SecDecomposeA2Apow2 function is used to extract a₁ at Lines 1-5. Then the remaining operations, which SecDecomposeA2Apow2 performs on the arithmetic shares, are replaced by Boolean operations that achieve essentially the same result. Instead of subtracting (a₁−1) and multiplying by α (SecDecomposeA2Apow2, Lines 6-7), t is shifted by n bits to the right at Line 6, which is equivalent to dividing by ζ or multiplying by α. Finally, the subtraction γ−t is performed at Line 7 using the two's complement of t.

Function SecDecomposeA2Bpow2(a^((⋅)A))
Input: An arithmetic sharing a^((⋅)A) of a coefficient.
Output: A Boolean decomposition a₁, a₀^((⋅)B) of a^((⋅)A).
1: t^((⋅)A) = a^((⋅)A) + γ
2: t^((⋅)A) = α⁻¹ × t^((⋅)A) − (q mod ζ)    ▷ mod q
3: t^((⋅)B) = A2B(t^((⋅)A))
4: a₁^((⋅)B) = t^((⋅)B) ∧ (ζ − 1)
5: a₁ = ⊕_(i=0)^(n_(s)−1) a₁^((i)B)    ▷ unmasking of a₁
6: t^((⋅)B) = t^((⋅)B) >> n
7: a₀^((⋅)B) = SecAdd((γ + (q mod ζ))^((⋅)B), ¬t^((⋅)B))
8: return a₁, a₀^((⋅)B)
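The Boolean tail of SecDecomposeA2Bpow2 (Lines 6-7) can likewise be traced on unmasked values. In the sketch below the masked SecAdd is replaced by an ordinary addition modulo 2^width, where the 32-bit width, the function name, and the final sign conversion are assumptions made for the example. The identity γ+(q mod ζ)+¬t = γ−t used at Line 7 relies on q mod ζ being 1, which holds for the Dilithium parameters.

```python
Q = 2**23 - 2**13 + 1  # Dilithium modulus


def decompose_a2b_pow2_unmasked(a, alpha, q=Q, width=32):
    """Value-level trace of SecDecomposeA2Bpow2 (no shares, no protection)."""
    gamma = alpha // 2
    alpha_inv = pow(alpha, -1, q)
    zeta = (-alpha_inv) % q
    n = zeta.bit_length() - 1
    if zeta != 1 << n:
        raise ValueError("zeta must be a power of two for this variant")
    mask = (1 << width) - 1

    t = (a + gamma) % q                          # Line 1
    t = (alpha_inv * t - (q % zeta)) % q         # Line 2
    a1 = t & (zeta - 1)                          # Lines 3-5
    t >>= n                                      # Line 6
    not_t = ~t & mask                            # bit-wise negation on width bits
    a0 = (gamma + (q % zeta) + not_t) & mask     # Line 7: plain add stands in for SecAdd
    if a0 >> (width - 1):                        # interpret as two's complement
        a0 -= 1 << width
    return a1, a0
```

The result agrees with the arithmetic variant: for example, a=γ+1 with α=(q−1)/16 gives a₁=1 and a₀=1−γ.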

A function SecDecomposeA2Anotpow2 is directed to cases where ζ is not a power of 2. It essentially performs the same operations as SecDecomposeA2Apow2. However, in this case, the AND on Line 4 in SecDecomposeA2Apow2 cannot be used to extract the bits of a₁ because ζ is not a power of 2. Instead, a modular reduction using a B2A conversion modulo ζ is performed. At Line 4, a₁^((⋅)A) is extracted from t^((⋅)B) using a B2A function. Then at Line 5, a₁ is unmasked. Lines 6-9 are identical to those in SecDecomposeA2Apow2.

Function SecDecomposeA2Anotpow2(a^((⋅)A))
Input: An arithmetic sharing a^((⋅)A) of a coefficient.
Output: A decomposition a₁, a₀^((⋅)A) of a^((⋅)A).
1: t^((⋅)A) = a^((⋅)A) + γ
2: t^((⋅)A) = α⁻¹ × t^((⋅)A) − (q mod ζ)    ▷ mod q
3: t^((⋅)B) = A2B(t^((⋅)A))
4: a₁^((⋅)A) = B2A_(ζ)(t^((⋅)B))
5: a₁ = Σ_(i=0)^(n_(s)−1) a₁^((i)A) mod q    ▷ unmasking of a₁
6: u^((⋅)A) = t^((⋅)A) − a₁ + (q mod ζ)
7: u^((⋅)A) = α × u^((⋅)A)    ▷ mod q
8: a₀^((⋅)A) = u^((⋅)A) − γ
9: return a₁, a₀^((⋅)A)
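For the non power of 2 case, the only change at the value level is that a₁ is obtained by a reduction modulo ζ rather than by a bit mask. The sketch below traces this on unmasked integers for α=(q−1)/44, where ζ=44; the A2B and B2A_(ζ) conversions and the unmasking collapse into a single `% zeta`. The function name and the signed mapping of a₀ are assumptions for the example, and no masking is performed.

```python
Q = 2**23 - 2**13 + 1  # Dilithium modulus


def decompose_a2a_notpow2_unmasked(a, alpha, q=Q):
    """Value-level trace of SecDecomposeA2Anotpow2 (no shares, no protection)."""
    gamma = alpha // 2
    alpha_inv = pow(alpha, -1, q)
    zeta = (-alpha_inv) % q

    t = (a + gamma) % q                      # Line 1
    t = (alpha_inv * t - (q % zeta)) % q     # Line 2
    a1 = t % zeta                            # Lines 3-5: reduction modulo zeta
    u = (t - a1 + (q % zeta)) % q            # Line 6
    u = (alpha * u) % q                      # Line 7
    a0 = (u - gamma) % q                     # Line 8
    if a0 > q // 2:                          # signed representative of a0 mod q
        a0 -= q
    return a1, a0
```

As in the power of 2 variants, the input a=q−1 produces `(0, -1)` without a dedicated corner-case branch.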

A description of auxiliary variables and functions used in the functions described herein is now provided.

-   q: modulus. It is equal to q=2²³−2¹³+1 for Dilithium.
-   γ: low-order rounding range. It is equal to $\frac{q - 1}{32}$ for Dilithium's NIST security levels 3 and 5, and equal to $\frac{q - 1}{88}$ for NIST security level 2.
-   α: decomposition base. It is equal to 2γ.
-   α⁻¹: multiplicative inverse of α modulo q.
-   ζ: additive inverse of α⁻¹.
-   n: number of bits in ζ if ζ is a power of 2, i.e., ζ=2^(n).
-   n_(s): The number of Boolean or arithmetic shares used in the sharing of the secret coefficient. Increasing this value will improve the side-channel security, but also lower the performance of the algorithm.
-   A2B: This function converts n_(s) arithmetic shares $x^{(\cdot)_A} \in \mathbb{Z}_{q}^{n_s}$ to n_(s) Boolean shares $x^{(\cdot)_B} \in \mathbb{Z}_{2^\omega}^{n_s}$, which encode the same secret coefficient $x \in \mathbb{Z}_{q}$.
-   B2A_(ζ): This function converts n_(s) Boolean shares $x^{(\cdot)_B} \in \mathbb{Z}_{2^\omega}^{n_s}$ to n_(s) arithmetic shares $x^{(\cdot)_A} \in \mathbb{Z}_{\zeta}^{n_s}$, which encode the value of the secret coefficient modulo ζ.
-   ∧: The function computes the bit-wise AND of two inputs. In a Boolean masking context, if one of the inputs is a constant or a public value, the ∧ operation is applied on each share of the other input independently.
-   >>: The function computes the bit-wise right shift of the input bitstring. In a Boolean masking context, the >> operation is applied on each share of the input independently.
-   ¬: The function computes the bit-wise negation of the input bitstring. When applied to a Boolean-shared input, only one share has to be negated, because ¬(P⊕Q)=(¬P)⊕Q.
-   +/−: The function computes arithmetic addition or subtraction of the inputs. When applied to one arithmetically-shared input and one public input, only one share has to be included in the addition or subtraction, because (P+Q)+R=P+(Q+R).
-   ×: The function computes arithmetic multiplication of the inputs. In an arithmetic masking context, if one of the inputs is a constant or a public value, the × operation is applied on each share of the other input independently.
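The share-wise rules listed above can be illustrated with a few small helpers. The function names and the default 32-bit width are hypothetical, and the code is only meant to show which shares each operation touches; it is not a hardened implementation.

```python
def and_const(shares, c):
    """AND with a public constant: applied to every Boolean share."""
    return [s & c for s in shares]


def shift_right(shares, n):
    """Right shift: applied to every Boolean share."""
    return [s >> n for s in shares]


def negate(shares, width=32):
    """Bit-wise negation: only one Boolean share is complemented."""
    mask = (1 << width) - 1
    return [~shares[0] & mask] + list(shares[1:])


def add_const(shares, c, q):
    """Add a public constant: only one arithmetic share is changed."""
    return [(shares[0] + c) % q] + list(shares[1:])


def mul_const(shares, c, q):
    """Multiply by a public constant: applied to every arithmetic share."""
    return [(s * c) % q for s in shares]
```

For instance, the Boolean shares `[0b1010, 0b0110, 0b1111]` encode 3; after `and_const(shares, 0b0101)` the shares XOR to 1, which equals 3 AND 5.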

The disclosed decomposition functions will now be compared to SecDecomposeOriginal. One of the disadvantages of using SecDecomposeOriginal is that it requires 138n_(s) random bits to mask constant public values. The fact that these values are masked also increases the number of operations because these masked values are inputs to additions over Boolean shares, which are costly operations.

A comparison of the number of operations required for SecDecomposeA2Apow2 and SecDecomposeA2Bpow2 will now be made. Let n_(s) be the number of Boolean or arithmetic shares, ω be the word length of the processor (usually 32 or 64 bits), and l=log₂ ω−1. Additionally, define n_(ƒ) to be the number of operations for a function ƒ.

The numbers of operations required for the elementary functions are provided in Table 1, and the numbers of operations required for SecDecomposeA2Apow2, SecDecomposeA2Bpow2 and SecDecomposeOriginal are provided in Table 2. The cost of the B2A conversion required for a₀ in Dilithium following SecDecomposeOriginal and SecDecomposeA2Bpow2 is not included.

TABLE 1 Number of operations for different functions.

Function | Number of Operations
+, −, ¬, ⊕ | 1
∧, ×, >> | n_(s)
A2B | $n_{s}^{3}\left(\frac{17}{2}l + 7\right) - n_{s}^{2}\left(12l + 4\right) + n_{s}\left(\frac{11}{2}l - 3\right) - 2l$
SecAnd | $\frac{7n_{s}^{2} - 5n_{s}}{2}$
SecOr | n_(SecAnd) + 3
SecAdd | l(2n_(SecAnd) + 3n_(s) + 2) + 2n_(SecAnd) + 6n_(s)

TABLE 2 Number of operations for SecDecomposeA2Apow2, SecDecomposeA2Bpow2 and SecDecomposeOriginal.

Function | Number of Operations
SecDecomposeA2Apow2 | n_(A2B) + 4n_(s) + 4
SecDecomposeA2Bpow2 | n_(A2B) + n_(SecAdd) + 4n_(s) + 2
SecDecomposeOriginal | n_(A2B) + 9n_(SecAdd) + 3n_(SecAnd) + 11n_(s) + 6
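The operation counts in Tables 1 and 2 can be evaluated numerically with a short sketch. The function names are hypothetical. The example parameters are assumptions chosen because they reproduce the NIST level 5 figures quoted earlier in this description: n_(s)=5 (reading d=5 as the number of shares), ω=32 so that l=4, and 8×256=2048 coefficients per signing iteration.

```python
from fractions import Fraction


def n_a2b(n_s, l):
    """Operation count of A2B from Table 1 (exact rational arithmetic)."""
    return (n_s**3 * (Fraction(17, 2) * l + 7)
            - n_s**2 * (12 * l + 4)
            + n_s * (Fraction(11, 2) * l - 3)
            - 2 * l)


def n_sec_and(n_s):
    """Operation count of SecAnd from Table 1."""
    return Fraction(7 * n_s**2 - 5 * n_s, 2)


def n_sec_add(n_s, l):
    """Operation count of SecAdd from Table 1."""
    return l * (2 * n_sec_and(n_s) + 3 * n_s + 2) + 2 * n_sec_and(n_s) + 6 * n_s


def ops_a2a_pow2(n_s, l):
    return n_a2b(n_s, l) + 4 * n_s + 4


def ops_a2b_pow2(n_s, l):
    return n_a2b(n_s, l) + n_sec_add(n_s, l) + 4 * n_s + 2


def ops_original(n_s, l):
    return n_a2b(n_s, l) + 9 * n_sec_add(n_s, l) + 3 * n_sec_and(n_s) + 11 * n_s + 6


if __name__ == "__main__":
    n_s, l, coeffs = 5, 4, 8 * 256  # assumed example parameters
    for name, f in (("A2Apow2", ops_a2a_pow2), ("A2Bpow2", ops_a2b_pow2),
                    ("Original", ops_original)):
        print(name, int(f(n_s, l) * coeffs))
```

Under these assumptions the per-coefficient counts are 3936 for SecDecomposeA2Apow2 and 11830 for SecDecomposeOriginal, i.e., roughly 8.06 million and 24.2 million operations for 2048 coefficients.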

Next, the number of random bits that need to be generated in the original algorithm versus the new decomposition functions is compared. SecDecomposeOriginal generates a lot of random bits to mask public values in order to perform masked additions straightforwardly. In SecDecomposeA2Apow2 no masking of public values is required, and only SecDecomposeA2Bpow2 requires randomness to mask the constant public value γ+1 to perform a masked addition.

The comparative results are summarized in Tables 3 and 4. There, r_(ƒ) denotes the number of random bits for a function ƒ.

TABLE 3 Number of random bits for different functions.

Function | Number of Random Bits
SecAnd, SecOr | $\frac{\omega}{2}\left(n_{s}^{2} - n_{s}\right)$
SecAdd | ω(l + 1)(n_(s)² − n_(s))
A2B | $n_{s}^{3}\omega\left(\frac{3}{2}l + 1\right) - n_{s}^{2}\omega\left(3l + 1\right) + \frac{3}{2}l n_{s}\omega$

TABLE 4 Number of random bits for SecDecomposeA2Apow2, SecDecomposeA2Bpow2 and SecDecomposeOriginal.

Function | Number of Random Bits
SecDecomposeA2Apow2 | r_(A2B)
SecDecomposeA2Bpow2 | r_(A2B) + r_(SecAdd) + 23n_(s)
SecDecomposeOriginal | r_(A2B) + 9r_(SecAdd) + 3r_(SecAnd) + 138n_(s)
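The randomness requirements in Tables 3 and 4 can be evaluated in the same way. The function names are hypothetical, and the example parameters (n_(s)=5, ω=32, l=4, 2048 coefficients) are assumptions chosen because they reproduce the NIST level 5 figures quoted earlier in this description.

```python
from fractions import Fraction


def r_sec_and(n_s, omega):
    """Random bits for SecAnd or SecOr (Table 3)."""
    return Fraction(omega, 2) * (n_s**2 - n_s)


def r_sec_add(n_s, omega, l):
    """Random bits for SecAdd (Table 3)."""
    return omega * (l + 1) * (n_s**2 - n_s)


def r_a2b(n_s, omega, l):
    """Random bits for A2B (Table 3)."""
    return (n_s**3 * omega * (Fraction(3, 2) * l + 1)
            - n_s**2 * omega * (3 * l + 1)
            + Fraction(3, 2) * l * n_s * omega)


def rand_a2a_pow2(n_s, omega, l):
    return r_a2b(n_s, omega, l)


def rand_a2b_pow2(n_s, omega, l):
    return r_a2b(n_s, omega, l) + r_sec_add(n_s, omega, l) + 23 * n_s


def rand_original(n_s, omega, l):
    return (r_a2b(n_s, omega, l) + 9 * r_sec_add(n_s, omega, l)
            + 3 * r_sec_and(n_s, omega) + 138 * n_s)


if __name__ == "__main__":
    n_s, omega, l, coeffs = 5, 32, 4, 8 * 256  # assumed example parameters
    print("A2Apow2 ", int(rand_a2a_pow2(n_s, omega, l) * coeffs))
    print("Original", int(rand_original(n_s, omega, l) * coeffs))
```

Under these assumptions SecDecomposeA2Apow2 needs 18560 random bits per coefficient against 49010 for SecDecomposeOriginal, i.e., roughly 38.0 million versus 100.4 million bits for 2048 coefficients.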

The calculations in Tables 1-4 show that the disclosed decomposition functions reduce the number of operations and random bits needed to perform the decomposition function.

The countermeasures that result from using the implementation of the decomposition functions disclosed herein provide a technological advantage over the prior art by requiring fewer calculations and the generation of fewer random bits than prior implementations. This will allow for lattice-based post-quantum cryptography schemes to be implemented in more applications that have limited processing resources.

FIG. 1 illustrates an exemplary hardware diagram 100 for implementing masked polynomial decomposition by using the functions SecDecomposeA2Apow2, SecDecomposeA2Bpow2, and SecDecomposeA2Anotpow2. As illustrated, the device 100 includes a processor 120, memory 130, user interface 140, network interface 150, and storage 160 interconnected via one or more system buses 110. It will be understood that FIG. 1 constitutes, in some respects, an abstraction and that the actual organization of the components of the device 100 may be more complex than illustrated.

The processor 120 may be any hardware device capable of executing instructions stored in memory 130 or storage 160 or otherwise processing data. As such, the processor may include a microprocessor, microcontroller, graphics processing unit (GPU), field programmable gate array (FPGA), application-specific integrated circuit (ASIC), or other similar devices. The processor may be implemented as a secure processor or may include both a secure processor and unsecure processor.

The memory 130 may include various memories such as, for example, L1, L2, or L3 cache or system memory. As such, the memory 130 may include static random-access memory (SRAM), dynamic RAM (DRAM), flash memory, read only memory (ROM), or other similar memory devices.

The user interface 140 may include one or more devices for enablingcommunication with a user as needed. For example, the user interface 140may include a display, a touch interface, a mouse, and/or a keyboard forreceiving user commands. In some embodiments, the user interface 140 mayinclude a command line interface or graphical user interface that may bepresented to a remote terminal via the network interface 150.

The network interface 150 may include one or more devices for enablingcommunication with other hardware devices. For example, the networkinterface 150 may include a network interface card (NIC) configured tocommunicate according to the Ethernet protocol or other communicationsprotocols, including wireless protocols. Additionally, the networkinterface 150 may implement a TCP/IP stack for communication accordingto the TCP/IP protocols. Various alternative or additional hardware orconfigurations for the network interface 150 will be apparent.

The storage 160 may include one or more machine-readable storage media such as read-only memory (ROM), random-access memory (RAM), magnetic disk storage media, optical storage media, flash-memory devices, or similar storage media. In various embodiments, the storage 160 may store instructions for execution by the processor 120 or data upon which the processor 120 may operate. For example, the storage 160 may store a base operating system 161 for controlling various basic operations of the hardware 100. The storage 160 may also include instructions 162 for implementing masked polynomial decomposition by using the functions SecDecomposeA2Apow2, SecDecomposeA2Bpow2, and SecDecomposeA2Anotpow2 described above.

It will be apparent that various information described as stored in the storage 160 may be additionally or alternatively stored in the memory 130. In this respect, the memory 130 may also be considered to constitute a “storage device” and the storage 160 may be considered a “memory.” Various other arrangements will be apparent. Further, the memory 130 and storage 160 may both be considered to be “non-transitory machine-readable media.” As used herein, the term “non-transitory” will be understood to exclude transitory signals but to include all forms of storage, including both volatile and non-volatile memories.

While the host device 100 is shown as including one of each described component, the various components may be duplicated in various embodiments. For example, the processor 120 may include multiple microprocessors that are configured to independently execute the methods described herein or are configured to perform steps or subroutines of the methods described herein such that the multiple processors cooperate to achieve the functionality described herein. Further, where the device 100 is implemented in a cloud computing system, the various hardware components may belong to separate physical systems. For example, the processor 120 may include a first processor in a first server and a second processor in a second server.

As used herein, the term “non-transitory machine-readable storage medium” will be understood to exclude a transitory propagation signal but to include all forms of volatile and non-volatile memory. When software is implemented on a processor, the combination of software and processor becomes a single specific machine. Although the various embodiments have been described in detail, it should be understood that the invention is capable of other embodiments and its details are capable of modifications in various obvious respects.

Because the data processing implementing the present invention is, for the most part, composed of electronic components and circuits known to those skilled in the art, circuit details will not be explained in any greater extent than that considered necessary as illustrated above, for the understanding and appreciation of the underlying concepts of the present invention and in order not to obfuscate or distract from the teachings of the present invention.

Although the invention is described herein with reference to specific embodiments, various modifications and changes can be made without departing from the scope of the present invention as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present invention. Any benefits, advantages, or solutions to problems that are described herein with regard to specific embodiments are not intended to be construed as a critical, required, or essential feature or element of any or all the claims.

Furthermore, the terms “a” or “an,” as used herein, are defined as one or more than one. Also, the use of introductory phrases such as “at least one” and “one or more” in the claims should not be construed to imply that the introduction of another claim element by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim element to inventions containing only one such element, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an.” The same holds true for the use of definite articles.

Unless stated otherwise, terms such as “first” and “second” are used to arbitrarily distinguish between the elements such terms describe. Thus, these terms are not necessarily intended to indicate temporal or other prioritization of such elements.

Any combination of specific software running on a processor to implement the embodiments of the invention constitutes a specific dedicated machine.

It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative circuitry embodying the principles of the invention.

What is claimed is:
 1. A data processing system comprising instructions embodied in a non-transitory computer readable medium, the instructions for a cryptographic operation including a masked decomposition of a polynomial a having n_(s) arithmetic shares into a high part a₁ and a low part a₀ for lattice-based cryptography in a processor, the instructions, comprising: performing a rounded Euclidian division of the polynomial a by a base α to compute t^((⋅)A); extracting Boolean shares a₁^((⋅)B) from n low bits of t by performing an arithmetic share to Boolean share (A2B) conversion on t^((⋅)A) and performing an AND with ζ−1, where ζ=−α⁻¹ is a power of 2; unmasking a₁ by combining Boolean shares of a₁^((⋅)B); calculating arithmetic shares a₀^((⋅)A) of the low part a₀; and performing a cryptographic function using a₁ and a₀^((⋅)A).
 2. The data processing system of claim 1, wherein performing a rounded Euclidian division of the polynomial a by a base α to compute t^((⋅)A) includes adding α/2 to a^((⋅)A) and dividing by α.
 3. The data processing system of claim 2, wherein performing a rounded Euclidian division of the polynomial a by a base α to compute t^((⋅)A) includes calculating: t^((⋅)A) = a^((⋅)A) + γ; and t^((⋅)A) = α⁻¹ × t^((⋅)A) − (q mod ζ), where γ=α/2 and q is a prime modulus.
 4. The data processing system of claim 1, wherein calculating arithmetic shares a₀^((⋅)A) of the low part a₀ includes: calculating u^((⋅)A) by subtracting a₁ from t^((⋅)A) and adding q mod ζ, where q is a prime modulus; and multiplying u^((⋅)A) by α and then subtracting α/2.
 5. A data processing system comprising instructions embodied in a non-transitory computer readable medium, the instructions for a cryptographic operation including a masked decomposition of a polynomial a having n_(s) arithmetic shares into a high part a₁ and a low part a₀ for lattice-based cryptography in a processor, the instructions, comprising: performing a rounded Euclidian division of the polynomial a by a base α to compute t^((⋅)A); extracting Boolean shares a₁^((⋅)B) from n low bits of t by performing an arithmetic share to Boolean share (A2B) conversion on t^((⋅)A) and performing an AND with ζ−1, where ζ=−α⁻¹ is a power of 2; unmasking a₁ by combining Boolean shares of a₁^((⋅)B); calculating the Boolean shares a₀^((⋅)B) of the low part a₀; and performing a cryptographic function using a₁ and a₀^((⋅)B).
 6. The data processing system of claim 5, wherein performing a rounded Euclidian division of the polynomial a by a base α to compute t^((⋅)A) includes adding α/2 to a^((⋅)A) and dividing by α.
 7. The data processing system of claim 6, wherein performing a rounded Euclidian division of the polynomial a by a base α to compute t^((⋅)A) includes calculating: t^((⋅)A) = a^((⋅)A) + γ; and t^((⋅)A) = α⁻¹ × t^((⋅)A) − (q mod ζ), where γ=α/2 and q is a prime modulus.
 8. The data processing system of claim 5, wherein calculating the Boolean shares a₀^((⋅)B) of the low part a₀ includes: shifting t^((⋅)B) n bits to the right, where n is a number of bits in ζ; and calculating a₀^((⋅)B) = SecAdd((γ+(q mod ζ))^((⋅)B), ¬t^((⋅)B)), where γ=α/2 and q is a prime modulus.
 9. A data processing system comprising instructions embodied in a non-transitory computer readable medium, the instructions for a cryptographic operation including a masked decomposition of a polynomial a having n_(s) arithmetic shares into a high part a₁ and a low part a₀ for lattice-based cryptography in a processor, the instructions, comprising: performing a rounded Euclidian division of the polynomial a by a base α to compute t^((⋅)A); extracting Boolean shares a₁^((⋅)B) from n low bits of t by performing an arithmetic share to Boolean share (A2B) conversion on t^((⋅)A) to produce t^((⋅)B) and performing a Boolean share to arithmetic share (B2A) conversion on t^((⋅)B), where ζ=−α⁻¹; unmasking a₁ by combining arithmetic shares of a₁^((⋅)A); calculating the arithmetic shares a₀^((⋅)A) of the low part a₀; and performing a cryptographic function using a₁ and a₀^((⋅)A).
 10. The data processing system of claim 9, wherein performing a rounded Euclidian division of the polynomial a by a base α to compute t^((⋅)A) includes adding α/2 to a^((⋅)A) and dividing by α.
 11. The data processing system of claim 10, wherein performing a rounded Euclidian division of the polynomial a by a base α to compute t^((⋅)A) includes calculating: t^((⋅)A) = a^((⋅)A) + γ; and t^((⋅)A) = α⁻¹ × t^((⋅)A) − (q mod ζ), where γ=α/2 and q is a prime modulus.
 12. The data processing system of claim 9, wherein calculating the arithmetic shares a₀^((⋅)A) of the low part a₀ includes: calculating u^((⋅)A) by subtracting a₁ from t^((⋅)A) and adding q mod ζ, where q is a prime modulus; and multiplying u^((⋅)A) by α and then subtracting α/2.